{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "97c79c38-38a3-40f3-ba2e-250649347d63",
   "metadata": {},
   "source": [
    "# Multimodal Parsing Using GPT-4o mini\n",
    "\n",
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_cloud_services/blob/main/examples/parse/multimodal/gpt4o_mini.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\n",
    "This cookbook shows you how to use LlamaParse to parse documents with the multimodal capabilities of GPT-4o mini.\n",
    "\n",
    "LlamaParse lets you plug in external multimodal model vendors for parsing, while it handles error correction, validation, scalability, and reliability for you.\n",
    "\n",
    "Status:\n",
    "| Last Executed | Version | State      |\n",
    "|---------------|---------|------------|\n",
    "| Aug-19-2025   | 0.6.61  | Maintained |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15e60ecf-519c-41fc-911b-765adaf8bad4",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "Download the data: Meta's blog post on Llama 3.1, in PDF form."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d9fb0aa-74cd-476f-8161-efd9e04248bf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2025-08-20 09:01:29--  https://www.dropbox.com/scl/fi/8iu23epvv3473im5rq19g/llama3.1_blog.pdf?rlkey=5u417tbdox4aip33fdubvni56&st=dzozd11e&dl=1\n",
      "Resolving www.dropbox.com (www.dropbox.com)... 162.125.1.18, 2620:100:6016:18::a27d:112\n",
      "Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected.\n",
      "HTTP request sent, awaiting response... 302 Found\n",
      "Location: https://uc29796f0b776076192093df7b2d.dl.dropboxusercontent.com/cd/0/inline/CvxiobAxsMsABs0DEDrx1mQ4P4l3JsmP2sR43DDeERGKF46mpTn7IFVWd4tKNsnH5ktPFJS_XYJG7jzY4B_-hCc9sXoVRVL74CYo95FjlLfLroFwdAtq-f00E7BrSfVABBwjXltHN2LtIXuyNWsRg0_t/file?dl=1# [following]\n",
      "--2025-08-20 09:01:29--  https://uc29796f0b776076192093df7b2d.dl.dropboxusercontent.com/cd/0/inline/CvxiobAxsMsABs0DEDrx1mQ4P4l3JsmP2sR43DDeERGKF46mpTn7IFVWd4tKNsnH5ktPFJS_XYJG7jzY4B_-hCc9sXoVRVL74CYo95FjlLfLroFwdAtq-f00E7BrSfVABBwjXltHN2LtIXuyNWsRg0_t/file?dl=1\n",
      "Resolving uc29796f0b776076192093df7b2d.dl.dropboxusercontent.com (uc29796f0b776076192093df7b2d.dl.dropboxusercontent.com)... 162.125.1.15, 2620:100:6016:15::a27d:10f\n",
      "Connecting to uc29796f0b776076192093df7b2d.dl.dropboxusercontent.com (uc29796f0b776076192093df7b2d.dl.dropboxusercontent.com)|162.125.1.15|:443... connected.\n",
      "HTTP request sent, awaiting response... 302 Found\n",
      "Location: /cd/0/inline2/CvwV8il1jZEc68KALo74AWW6KpFtSpJtE6pURwe0VPUfy3h8444UzIbiuEzJqt-nrT642eNdWpfhf0cZywophk8xT3g1EZALEaa1NWuV7sqSPm-LwY7uv1PvJW4B8Zx7iyK4zHf6rAV7Z_k6xTaSgtFmQxrrkm6LMOQE1URHDxNUa4gGU_2drLmiEQyZsgHMcN0pHGJMJVNtKTlheHDZkB2ldrqnozKIMIQWjP8f0eWjPLMXKmJtnU19XnwHIKp_cmZ4hsPa06zLovbrkei_40N0r99sfU2mgjQasv2osRfAOIBBQFKSIzJXCHct_QxeVaHSR6wveM9LS0JIK4c1FbPD1zS4NJVReDkuDXvcm23VOCheRyh8lsegV8rNRpOVZd8/file?dl=1 [following]\n",
      "--2025-08-20 09:01:30--  https://uc29796f0b776076192093df7b2d.dl.dropboxusercontent.com/cd/0/inline2/CvwV8il1jZEc68KALo74AWW6KpFtSpJtE6pURwe0VPUfy3h8444UzIbiuEzJqt-nrT642eNdWpfhf0cZywophk8xT3g1EZALEaa1NWuV7sqSPm-LwY7uv1PvJW4B8Zx7iyK4zHf6rAV7Z_k6xTaSgtFmQxrrkm6LMOQE1URHDxNUa4gGU_2drLmiEQyZsgHMcN0pHGJMJVNtKTlheHDZkB2ldrqnozKIMIQWjP8f0eWjPLMXKmJtnU19XnwHIKp_cmZ4hsPa06zLovbrkei_40N0r99sfU2mgjQasv2osRfAOIBBQFKSIzJXCHct_QxeVaHSR6wveM9LS0JIK4c1FbPD1zS4NJVReDkuDXvcm23VOCheRyh8lsegV8rNRpOVZd8/file?dl=1\n",
      "Reusing existing connection to uc29796f0b776076192093df7b2d.dl.dropboxusercontent.com:443.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 14191422 (14M) [application/binary]\n",
      "Saving to: ‘data/llama3.1_blog.pdf’\n",
      "\n",
      "data/llama3.1_blog. 100%[===================>]  13.53M  24.4MB/s    in 0.6s    \n",
      "\n",
      "2025-08-20 09:01:31 (24.4 MB/s) - ‘data/llama3.1_blog.pdf’ saved [14191422/14191422]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p data\n",
    "!wget \"https://www.dropbox.com/scl/fi/8iu23epvv3473im5rq19g/llama3.1_blog.pdf?rlkey=5u417tbdox4aip33fdubvni56&st=dzozd11e&dl=1\" -O \"data/llama3.1_blog.pdf\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c70d420d-1778-4b0d-81e2-db09276e90cf",
   "metadata": {},
   "source": [
    "![llama_blog_img](llama3.1-p5.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e29a9d7-5bd9-4fb8-8ec1-4c128a748662",
   "metadata": {},
   "source": [
    "## Initialize LlamaParse\n",
    "\n",
    "Initialize LlamaParse in multimodal mode, and specify the vendor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2e9d9cf-8189-4fcb-b34f-cde6cc0b59c8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Started parsing the file under job_id 5c002568-5fcb-4741-abb2-6cfe598646c1\n"
     ]
    }
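   ],
   "source": [
    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(\n",
    "    # Parse each page with a multimodal (vision) model\n",
    "    parse_mode=\"parse_page_with_lvm\",\n",
    "    vendor_multimodal_model_name=\"openai-gpt-4o-mini\",\n",
    "    # Optional: uncomment to supply your own vendor API key\n",
    "    # vendor_multimodal_api_key=\"sk-...\",\n",
    "    high_res_ocr=True,\n",
    "    adaptive_long_table=True,\n",
    "    outlined_table_extraction=True,\n",
    "    output_tables_as_HTML=True,\n",
    "    api_key=\"llx-...\",  # your LlamaCloud API key\n",
    ")\n",
    "\n",
    "# Top-level await works in a notebook; in a script, run this inside an async function\n",
    "result = await parser.aparse(\"./data/llama3.1_blog.pdf\")"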
   ]
  },
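  {
   "cell_type": "markdown",
   "id": "env-api-keys-sketch",
   "metadata": {},
   "source": [
    "The parser above takes its key as a literal `api_key` string. As a minimal sketch, assuming you would rather keep keys out of the notebook, you could read them from environment variables instead. `require_env` is a hypothetical helper (not part of LlamaParse), and the variable name `LLAMA_CLOUD_API_KEY` is an assumption about how you export the key.\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "\n",
    "def require_env(name):\n",
    "    # Hypothetical helper: return the variable's value, or fail with a clear message\n",
    "    value = os.environ.get(name)\n",
    "    if not value:\n",
    "        raise RuntimeError(f'Set the {name} environment variable first.')\n",
    "    return value\n",
    "\n",
    "\n",
    "# Assumed usage; the variable name is a choice, not a requirement:\n",
    "# parser = LlamaParse(api_key=require_env('LLAMA_CLOUD_API_KEY'), parse_mode='parse_page_with_lvm')\n",
    "```\n",
    "\n",
    "Failing early with a named variable tends to be easier to debug than an authentication error returned by the parsing job."
   ]
  },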
  {
   "cell_type": "markdown",
   "id": "44c20f7a-2901-4dd0-b635-a4b33c5664c1",
   "metadata": {},
   "source": [
    "## View Results\n",
    "\n",
    "Let's view the parsed output from GPT-4o mini alongside the original document page."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "592d82bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = result.get_markdown_documents(split_by_page=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "778698aa-da7e-4081-b3b5-0372f228536f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "page_number: 5\n",
      "file_name: ./data/llama3.1_blog.pdf\n",
      "\n",
      "  \n",
      "Introducing Llama 3.1: Our most capable models to date  \n",
      "  \n",
      "\n",
      "# Category Benchmark\n",
      "\n",
      "| Benchmark                     | Llama 3.1 8B | Gemma 2 9B IT | Mistral 7B Instruct | Llama 3.1 70B | Mixtral 8x228 Instruct | GPT 3.5 Turbo |\n",
      "|-------------------------------|---------------|----------------|---------------------|----------------|------------------------|----------------|\n",
      "| General                       |               |                |                     |                |                        |                |\n",
      "| MMLU (0-shot, non-CoT)       | 73.0          | 72.3           | 60.5                | 86.0           | 79.9                   | 69.8           |\n",
      "| MMLU PRO (5-shot, CoT)       | 48.3          | 36.9           | 36.9                | 66.4           | 56.3                   | 49.2           |\n",
      "| IFEval                        | 80.4          | 73.6           | 57.6                | 87.5           | 72.7                   | 69.9           |\n",
      "| Code                          |               |                |                     |                |                        |                |\n",
      "| HumanEval (0-shot)           | 72.6          | 54.3           | 40.2                | 80.5           | 75.6                   | 68.0           |\n",
      "| MBPP EvalPlus (based on CoT) | 72.8          | 71.7           | 49.5                | 86.0           | 78.6                   | 82.0           |\n",
      "| Math                          |               |                |                     |                |                        |                |\n",
      "| GSM8K (0-shot, CoT)          | 84.5          | 76.7           | 53.2                | 95.1           | 88.2                   | 81.6           |\n",
      "| MATH (0-shot, CoT)           | 51.9          | 44.3           | 13.0                | 68.0           | 54.1                   | 43.1           |\n",
      "| Reasoning                     |               |                |                     |                |                        |                |\n",
      "| ARC Challenge (0-shot)       | 83.4          | 87.6           | 74.2                | 94.8           | 88.7                   | 83.7           |\n",
      "| GPA (0-shot)                 | 32.8          | 28.8           | 28.8                | 46.7           | 33.3                   | 30.8           |\n",
      "| Tool use                      |               |                |                     |                |                        |                |\n",
      "| BFCL                          | 76.1          | 60.4           | 84.8                |                |                        | 85.9           |\n",
      "| Nexus                         | 38.5          | 30.0           | 24.7                | 56.7           | 48.5                   | 37.2           |\n",
      "| Long context                  |               |                |                     |                |                        |                |\n",
      "| ZeroSCROLLS/QualiTY          | 81.0          |                | 90.5                |                |                        |                |\n",
      "| InfiniteBench/En.MC          | 65.1          |                | 78.2                |                |                        |                |\n",
      "| NIH/Multi-needle              | 98.8          | -              | -                   | 97.5           | -                      | -              |\n",
      "| Multilingual MGSM (0-shot)    | 68.9          | 53.2           | 29.9                | 86.9           | 71.1                   | 51.4           |\n",
      "\n",
      "# Llama 3.1 405B Human Evaluation\n",
      "\n",
      "| Comparison                                      | Win   | Tie   | Loss   |\n",
      "|------------------------------------------------|-------|-------|--------|\n",
      "| Llama 3.1 405B vs GPT-4-0125-Preview           | 23.3% | 52.2% | 24.5%  |\n",
      "| Llama 3.1 405B vs GPT-4                         | 19.1% | 51.7% | 29.2%  |\n",
      "| Llama 3.1 405B vs Claude 3.5 Sonnet             | 24.9% | 50.8% | 24.2%  |\n",
      "\n",
      "  \n",
      "https://ai.meta.com/blog/meta-llama-3-1/\n"
     ]
    }
   ],
   "source": [
    "# The list is 0-indexed, so documents[4] is page 5 (the benchmark tables)\n",
    "print(documents[4].get_content(metadata_mode=\"all\"))"
   ]
  },
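  {
   "cell_type": "markdown",
   "id": "find-page-sketch",
   "metadata": {},
   "source": [
    "The cell above picks page 5 by list position. As a hedged alternative, the sketch below looks a page up by the `page_number` metadata key that appears in the printed output. `find_page` is a hypothetical helper, the stand-in objects exist only so the snippet runs without a parse job, and the comparison goes through `str` because the stored type of `page_number` is an assumption.\n",
    "\n",
    "```python\n",
    "from types import SimpleNamespace\n",
    "\n",
    "\n",
    "def find_page(documents, page_number):\n",
    "    # Return the first document whose metadata matches the page number, else None\n",
    "    for doc in documents:\n",
    "        if str(doc.metadata.get('page_number')) == str(page_number):\n",
    "            return doc\n",
    "    return None\n",
    "\n",
    "\n",
    "# Stand-in documents (assumption: real ones expose a .metadata dict the same way)\n",
    "docs = [SimpleNamespace(metadata={'page_number': n}, text=f'page {n}') for n in range(1, 8)]\n",
    "page5 = find_page(docs, 5)\n",
    "print(page5.text)\n",
    "```\n",
    "\n",
    "With the real parse result you would pass `documents` in place of `docs`; a lookup by metadata keeps working even if the list order or page splitting changes."
   ]
  },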
  {
   "cell_type": "markdown",
   "id": "705f7729-fa0f-4ca0-8562-c42afeaa8532",
   "metadata": {},
   "source": [
    "## Set Up a RAG Pipeline\n",
    "\n",
    "Let's set up a RAG pipeline over this data.\n",
    "\n",
    "We use gpt-5-mini for the text synthesis step and text-embedding-3-large for the embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a53ee5d-cc63-421b-8896-588c83edfcf0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import Settings\n",
    "from llama_index.llms.openai import OpenAI\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "\n",
    "Settings.llm = OpenAI(model=\"gpt-5-mini\", api_key=\"sk-...\")\n",
    "Settings.embed_model = OpenAIEmbedding(model=\"text-embedding-3-large\", api_key=\"sk-...\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60972d7a-7948-4ad7-89df-57004acee917",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import VectorStoreIndex\n",
    "\n",
    "index = VectorStoreIndex.from_documents(documents)\n",
    "query_engine = index.as_query_engine(similarity_top_k=5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7df7bcb-1df4-4a01-88fc-2d596b1cc74d",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"How does Llama3.1 compare against gpt-4o and Claude 3.5 Sonnet in human evals?\"\n",
    "\n",
    "response = query_engine.query(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b7070a31-3bb8-4134-8338-20bc2fd6f3d6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Reported human-evaluation results for Llama 3.1 (405B):\n",
      "\n",
      "- vs GPT-4-0125-Preview: Win 23.3%, Tie 52.2%, Loss 24.5%  \n",
      "- vs GPT-4: Win 19.1%, Tie 51.7%, Loss 29.2%  \n",
      "- vs Claude 3.5 Sonnet: Win 24.9%, Tie 50.8%, Loss 24.2%\n",
      "\n",
      "There are no separate head-to-head human-eval numbers published specifically for GPT‑4o in the reported results.\n"
     ]
    }
   ],
   "source": [
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1200c9c0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Introducing Llama 3.1: Our most capable models to date  \n",
      "  \n",
      "\n",
      "# Category Benchmark\n",
      "\n",
      "| Benchmark                     | Llama 3.1 8B | Gemma 2 9B IT | Mistral 7B Instruct | Llama 3.1 70B | Mixtral 8x228 Instruct | GPT 3.5 Turbo |\n",
      "|-------------------------------|---------------|----------------|---------------------|----------------|------------------------|----------------|\n",
      "| General                       |               |                |                     |                |                        |                |\n",
      "| MMLU (0-shot, non-CoT)       | 73.0          | 72.3           | 60.5                | 86.0           | 79.9                   | 69.8           |\n",
      "| MMLU PRO (5-shot, CoT)       | 48.3          | 36.9           | 36.9                | 66.4           | 56.3                   | 49.2           |\n",
      "| IFEval                        | 80.4          | 73.6           | 57.6                | 87.5           | 72.7                   | 69.9           |\n",
      "| Code                          |               |                |                     |                |                        |                |\n",
      "| HumanEval (0-shot)           | 72.6          | 54.3           | 40.2                | 80.5           | 75.6                   | 68.0           |\n",
      "| MBPP EvalPlus (based on CoT) | 72.8          | 71.7           | 49.5                | 86.0           | 78.6                   | 82.0           |\n",
      "| Math                          |               |                |                     |                |                        |                |\n",
      "| GSM8K (0-shot, CoT)          | 84.5          | 76.7           | 53.2                | 95.1           | 88.2                   | 81.6           |\n",
      "| MATH (0-shot, CoT)           | 51.9          | 44.3           | 13.0                | 68.0           | 54.1                   | 43.1           |\n",
      "| Reasoning                     |               |                |                     |                |                        |                |\n",
      "| ARC Challenge (0-shot)       | 83.4          | 87.6           | 74.2                | 94.8           | 88.7                   | 83.7           |\n",
      "| GPA (0-shot)                 | 32.8          | 28.8           | 28.8                | 46.7           | 33.3                   | 30.8           |\n",
      "| Tool use                      |               |                |                     |                |                        |                |\n",
      "| BFCL                          | 76.1          | 60.4           | 84.8                |                |                        | 85.9           |\n",
      "| Nexus                         | 38.5          | 30.0           | 24.7                | 56.7           | 48.5                   | 37.2           |\n",
      "| Long context                  |               |                |                     |                |                        |                |\n",
      "| ZeroSCROLLS/QualiTY          | 81.0          |                | 90.5                |                |                        |                |\n",
      "| InfiniteBench/En.MC          | 65.1          |                | 78.2                |                |                        |                |\n",
      "| NIH/Multi-needle              | 98.8          | -              | -                   | 97.5           | -                      | -              |\n",
      "| Multilingual MGSM (0-shot)    | 68.9          | 53.2           | 29.9                | 86.9           | 71.1                   | 51.4           |\n",
      "\n",
      "# Llama 3.1 405B Human Evaluation\n",
      "\n",
      "| Comparison                                      | Win   | Tie   | Loss   |\n",
      "|------------------------------------------------|-------|-------|--------|\n",
      "| Llama 3.1 405B vs GPT-4-0125-Preview           | 23.3% | 52.2% | 24.5%  |\n",
      "| Llama 3.1 405B vs GPT-4                         | 19.1% | 51.7% | 29.2%  |\n",
      "| Llama 3.1 405B vs Claude 3.5 Sonnet             | 24.9% | 50.8% | 24.2%  |\n",
      "\n",
      "  \n",
      "https://ai.meta.com/blog/meta-llama-3-1/\n"
     ]
    }
   ],
   "source": [
    "print(response.source_nodes[0].text)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
